Large-scale genealogical information extraction from handwritten Quebec parish records
Abstract
This paper presents a complete workflow designed for extracting information from Quebec handwritten parish registers. The acts in these documents contain individual and family information that is highly valuable for genetic, demographic and social studies of the Quebec population. From an image of parish records, our workflow is able to identify the acts and extract personal information. The workflow is divided into successive steps: page classification, text line detection, handwritten text recognition, named entity recognition, and act detection and classification. For all these steps, different machine learning models are compared. Once the information is extracted, validation rules designed by experts are then applied to standardize the extracted information and ensure its consistency with the type of act (birth, marriage, death). This validation step is able to reject records that are considered invalid or merged. The full workflow has been used to process over two million pages of Quebec parish registers from the 19th and 20th centuries. On a sample comprising 65% of the registers, 3.2 million acts were recognized. Verification of the birth and death acts from this sample shows that 74% of them are valid. These records will be integrated into the BALSAC database and linked together to recreate genealogical relations at large scale.
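The validation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the experts' actual rules, so the field names, the `REQUIRED_FIELDS` table, and the `standardize` and `validate` helpers below are all hypothetical; the sketch only shows the general idea of standardizing extracted values and then checking them against the act type.

```python
from dataclasses import dataclass, field

# Hypothetical required fields per act type (an assumption for illustration;
# the rules actually designed by the experts are not listed in the abstract).
REQUIRED_FIELDS = {
    "birth": {"child_name", "father_name", "mother_name", "date"},
    "marriage": {"groom_name", "bride_name", "date"},
    "death": {"deceased_name", "date"},
}


@dataclass
class Act:
    act_type: str
    fields: dict = field(default_factory=dict)


def standardize(act):
    """Collapse whitespace, drop empty values, and title-case name fields."""
    cleaned = {}
    for key, value in act.fields.items():
        text = " ".join(value.split())
        if not text:
            continue  # an empty extraction counts as a missing field
        cleaned[key] = text.title() if key.endswith("_name") else text
    return Act(act.act_type.strip().lower(), cleaned)


def validate(act):
    """Return (is_valid, missing_fields) for a standardized act."""
    required = REQUIRED_FIELDS.get(act.act_type)
    if required is None:
        return False, set()  # unknown act type: reject
    missing = required - set(act.fields)
    return not missing, missing
```

In this sketch, an act is rejected when its type is unknown or when a field required for that type is absent, which mirrors the consistency check with the act type only in a loose sense.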
Related resources
Information Extraction from Echocardiography Records
Electronic health records are a rich source for medical information. However, large parts of clinical diagnosis reports are in textual form and are therefore not per se usable for statistical evaluations. To transform the information from an unstructured into a structured form is the goal of medical language processing. In this paper we want to propose an approach for the creation of a training...
Information Extraction from Historical Semi-Structured Handwritten Documents
In this paper, we describe our approach to extract salient events such as birth and death records from historical French parish documents that contain free-form handwritten text. The challenges posed by these documents to the current state of the art in handwriting recognition and information extraction go well beyond the generic challenges in recognizing handwritten text such as style variatio...
How to improve information extraction from German medical records
Vast amounts of medical information are still recorded as unstructured text. The knowledge contained in this textual data has great potential to improve clinical routine care, to support clinical research, and to advance personalization of medicine. To access this knowledge, the underlying data has to be semantically integrated, an essential prerequisite to which is information extraction from...
Data-Driven Information Extraction from Chinese Electronic Medical Records
OBJECTIVE This study aims to propose a data-driven framework that takes unstructured free text narratives in Chinese Electronic Medical Records (EMRs) as input and converts them into structured time-event-description triples, where the description is either an elaboration or an outcome of the medical event. MATERIALS AND METHODS Our framework uses a hybrid approach. It consists of constructin...
Context Related Extraction of Conceptual Information from Electronic Health Records
This paper discusses some language technologies applied for the automatic processing of Electronic Health Records in Bulgarian, in order to extract multi-layer conceptual chunks from medical texts. We consider an Information Extraction view to text processing, where semantic information is extracted using predefined templates. At the first step the templates are filled in with information about...
Journal
Journal title: International Journal on Document Analysis and Recognition
Year: 2023
ISSN: 1433-2833, 1433-2825
DOI: https://doi.org/10.1007/s10032-023-00427-w